© John Wiley & Sons, Inc.
FIGURE 12-6: A general way of naming the cells of a cross-tab table.
Using these conventions, the basic formulas for the Pearson chi-square test are as follows:
Expected values:
Chi-square statistic:
Degrees of freedom:
where i and j are array indices that indicate the row and column, respectively, of each cell.
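As a worked illustration, the following Python sketch computes all three quantities for a small
cross-tab. It assumes the standard Pearson definitions: each cell’s expected count is its row total
times its column total divided by the grand total, the chi-square statistic is the sum over all cells
of (observed − expected)² / expected, and the degrees of freedom are (rows − 1) × (columns − 1).
The function name pearson_chi_square and the 2×2 counts are hypothetical, chosen only for
illustration; they do not come from the figure.

```python
def pearson_chi_square(observed):
    """Return (expected, chi_square, df) for a cross-tab given as a list of rows.

    Assumes every row total and column total is positive, so that no
    expected count is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for cell (i, j): row total i * column total j / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (observed - expected)^2 / expected over every cell of the table
    chi_square = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (number of rows - 1) * (number of columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_square, df


# Hypothetical 2x2 table of observed counts (illustrative numbers only)
table = [[20, 30], [30, 20]]
expected, chi_square, df = pearson_chi_square(table)
print(expected, chi_square, df)
```

For this made-up table every expected count is 25, each cell contributes 1 to the statistic, and
the result is a chi-square value of 4 with 1 degree of freedom.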
Pointing out the pros and cons of the chi-square test
The Pearson chi-square test is very popular for several reasons:
It’s easy! The calculations are simple to do manually in Microsoft Excel (although this is not
recommended because the risk of making a typing mistake is high). As described earlier, statistical
software packages like the ones discussed in Chapter 4 can perform the chi-square test on both
individual-level data and summarized cross-tabulated data. Also, several websites can
perform the test, and the test has been implemented on smartphones and tablets.
It’s flexible! The test works for tables with any number of rows and columns, and it easily handles
cell counts of any magnitude. Statistical software can usually complete the calculations quickly,
even on big data sets.
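The difference between individual-level data and summarized cross-tabulated data can be made
concrete with a short Python sketch. Assuming each subject is recorded as a (row category, column
category) pair, tallying the pairs produces the table of cell counts that the chi-square test works
on. The function name cross_tabulate and the sample records are hypothetical.

```python
from collections import Counter


def cross_tabulate(records):
    """Tally (row_category, column_category) pairs into a table of counts."""
    counts = Counter(records)
    row_labels = sorted({r for r, _ in records})
    col_labels = sorted({c for _, c in records})
    # A Counter returns 0 for any combination that never occurs
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Hypothetical individual-level data: one (treatment, outcome) pair per subject
records = [
    ("drug", "improved"), ("drug", "improved"), ("drug", "not improved"),
    ("placebo", "improved"), ("placebo", "not improved"), ("placebo", "not improved"),
]
row_labels, col_labels, table = cross_tabulate(records)
print(row_labels, col_labels, table)
```

Either form of input leads to the same cell counts, which is why software can accept both.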
But the chi-square test has some shortcomings:
It’s not an exact test. The p value it produces is only approximate, so using p < 0.05 as your
criterion for statistical significance (meaning setting α = 0.05) doesn’t necessarily guarantee that
your Type I error rate will be only 5 percent. Remember, your Type I error rate is the probability
that you will claim statistical significance for a difference that doesn’t really exist (see Chapter 3 for an
introduction to Type I errors). The level of accuracy of the statistical significance is high when all